The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Neural Information Processing Systems

We study linear regression under covariate shift, where the marginal distribution over the input covariates differs between the source and target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach for this problem: pretraining on the source data followed by finetuning on the target data, both conducted via online SGD. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, can drastically reduce the amount of source data required by pretraining.
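The pretraining-finetuning pipeline the abstract describes can be sketched in a few lines. This is an illustrative toy, not the paper's setting: the dimension, step sizes, noise level, and the particular source/target covariances below are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = rng.normal(size=d)  # regression parameter shared by both domains

def sample(n, scale):
    # covariate shift: only the input distribution differs between domains
    X = rng.normal(size=(n, d)) * scale
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

def online_sgd(w, X, y, lr):
    # one pass of online SGD: a single fresh sample per step
    for x_i, y_i in zip(X, y):
        w = w - lr * (x_i @ w - y_i) * x_i
    return w

src_scale = np.linspace(0.5, 2.0, d)  # anisotropic source covariance
Xs, ys = sample(4000, src_scale)      # plentiful source data
Xt, yt = sample(50, 1.0)              # scarce target data (isotropic)

w = np.zeros(d)
w = online_sgd(w, Xs, ys, lr=0.01)    # pretraining on the source domain
w = online_sgd(w, Xt, yt, lr=0.01)    # finetuning on the target domain

# excess risk proxy: mean squared error on fresh target-domain data
Xe, ye = sample(2000, 1.0)
print(float(np.mean((Xe @ w - ye) ** 2)))
```

Because the source covariance differs from the target's, the pretrained iterate is weighted toward directions the source distribution emphasizes; the short finetuning pass then adjusts the fit on the target distribution.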


On the Power and Limitations of Random Features for Understanding Neural Networks

Neural Information Processing Systems

Recently, a spate of papers has provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as if those components are essentially fixed at their initial random values. Fixing these components \emph{explicitly} leads to the well-known approach of learning with random features. In other words, these techniques imply that we can successfully learn with neural networks whenever we can successfully learn with random features. In this paper, we formalize the link between existing results and random features, and argue that despite the impressive positive results, random feature approaches are also inherently limited in what they can explain.
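As a concrete reference point for the random features approach discussed above, here is a minimal sketch: a one-hidden-layer ReLU network whose hidden weights are frozen at their random initialization, so that training reduces to linear regression over the induced features. The toy target, sizes, and ridge parameter are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 500, 5, 200

# hidden layer: random weights, frozen (never trained)
W = rng.normal(size=(d, width)) / np.sqrt(d)

def features(X):
    # fixed random ReLU features induced by the frozen hidden layer
    return np.maximum(X @ W, 0.0)

# toy target: a smooth function of the first two inputs
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# training only the output layer is ridge regression over the features
Phi = features(X)
a = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(width), Phi.T @ y)

train_mse = float(np.mean((Phi @ a - y) ** 2))
print(train_mse)
```

The paper's point is that this reduction has limits: there are simple targets that such fixed-feature models cannot learn efficiently, even though trained networks can.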


Power and limitations of single-qubit native quantum neural networks

Neural Information Processing Systems

Quantum neural networks (QNNs) have emerged as a leading strategy for establishing applications in machine learning, chemistry, and optimization. While the applications of QNNs have been widely investigated, their theoretical foundations remain less understood. In this paper, we formulate a theoretical framework for the expressive ability of data re-uploading quantum neural networks, which consist of interleaved encoding circuit blocks and trainable circuit blocks. First, we prove that single-qubit quantum neural networks can approximate any univariate function by mapping the model to a partial Fourier series. In particular, we establish the exact correlations between the parameters of the trainable gates and the Fourier coefficients, resolving an open problem on the universal approximation property of QNNs. Second, we discuss the limitations of single-qubit native QNNs in approximating multivariate functions by analyzing the frequency spectrum and the flexibility of the Fourier coefficients. We further demonstrate the expressivity and limitations of single-qubit native QNNs via numerical experiments. We believe these results improve our understanding of QNNs and provide a helpful guideline for designing powerful QNNs for machine learning tasks.
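The partial-Fourier-series picture can be checked numerically with a small sketch. I assume RZ gates for the data encoding and RY gates for the trainable blocks (one possible instantiation; the paper's framework is more general): with L encoding layers, the Pauli-Z expectation of the output state is a trigonometric polynomial whose frequencies lie in {-L, ..., L}.

```python
import numpy as np

def rz(x):
    # encoding block: rotation about Z, imprinting the data x as a phase
    return np.diag([np.exp(-1j * x / 2), np.exp(1j * x / 2)])

def ry(theta):
    # trainable block: rotation about Y
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def qnn(x, thetas):
    # data re-uploading circuit: RY(t_L) RZ(x) ... RY(t_1) RZ(x) RY(t_0) |0>,
    # output is the Pauli-Z expectation value of the final state
    psi = np.array([1.0, 0.0], dtype=complex)
    psi = ry(thetas[0]) @ psi
    for theta in thetas[1:]:
        psi = ry(theta) @ rz(x) @ psi
    z = np.diag([1.0, -1.0])
    return float(np.real(np.conj(psi) @ (z @ psi)))

L = 3  # number of encoding layers
rng = np.random.default_rng(1)
thetas = rng.uniform(0, 2 * np.pi, size=L + 1)

# sample the model over one period and inspect its Fourier spectrum:
# all coefficients with frequency above L vanish (up to numerical error)
xs = np.linspace(0, 2 * np.pi, 64, endpoint=False)
fx = np.array([qnn(x, thetas) for x in xs])
spec = np.abs(np.fft.fft(fx)) / len(xs)
print(np.max(spec[L + 1 : len(xs) - L]))
```

The vanishing high-frequency tail is exactly the frequency-spectrum limitation the abstract refers to: more encoding layers are needed to reach higher frequencies.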


Public-data Assisted Private Stochastic Optimization: Power and Limitations

Neural Information Processing Systems

We study the limits and capabilities of public-data assisted differentially private (PA-DP) algorithms. Specifically, we focus on the problem of stochastic convex optimization (SCO) with either labeled or unlabeled public data. We establish lower bounds for this problem via new lower bounds for PA-DP mean estimation, which are of a similar form. Up to constant factors, these lower bounds show that the simple strategy of either treating all data as private or discarding the private data is optimal. We also study PA-DP supervised learning with \textit{unlabeled} public samples.
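The two baseline strategies the lower bounds single out can be sketched for the simplest case, one-dimensional mean estimation. The bounded data range, the sample sizes, and the Gaussian-mechanism calibration below are my own illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_priv, n_pub = 1000, 100
# bounded samples in [-1, 1] drawn from the same underlying distribution
priv = np.clip(rng.normal(0.3, 1.0, n_priv), -1.0, 1.0)
pub = np.clip(rng.normal(0.3, 1.0, n_pub), -1.0, 1.0)

eps, delta = 1.0, 1e-5

def dp_mean(x, eps, delta):
    # Gaussian mechanism: the mean of n values in [-1, 1] has sensitivity 2/n
    sigma = (2.0 / len(x)) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return float(x.mean() + rng.normal(0.0, sigma))

# strategy 1: treat all data (public included) as private
est_all_private = dp_mean(np.concatenate([priv, pub]), eps, delta)
# strategy 2: discard the private data and use the public mean directly
est_public_only = float(pub.mean())

print(est_all_private, est_public_only)
```

The lower bounds say that, up to constants, no PA-DP algorithm can beat the better of these two trivial estimators in the worst case.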


Reviews: On the Power and Limitations of Random Features for Understanding Neural Networks

Neural Information Processing Systems

This would require a new review that cannot be performed in the current conference cycle. Regarding your answer to 3b,c: this notation is really misleading; if you want to denote the norm of a function, you should use f, not f(x). *** Original review *** Originality. The authors focus on negative results that show some fundamental limits on what can be learned with random features, which deviates from previous works that rather aim to provide positive results. Understanding the limits of random features is a valuable avenue of research. However, I find the included positive result (Theorem 3.1) difficult to assess as a significant contribution, since even the authors state that it is not fundamentally novel, but rather a newer, more compact proof of similar existing results. Either there are some major flaws in some parts, or I am misunderstanding something.


Reviews: On the Power and Limitations of Random Features for Understanding Neural Networks

Neural Information Processing Systems

This paper shows that random feature methods cannot efficiently learn even a single ReLU. This has stark implications for much recent work that tries to explain the success of deep learning through random feature methods. The authors should, however, take the reviews into account and improve the writing and presentation.


Uncovering the Power and Limitations of the TanH Activation Function in Neural Networks

#artificialintelligence

The TanH activation function is a commonly used activation function in neural networks. Similar to the Sigmoid function, the TanH function is particularly useful for binary classification tasks. In this post, we'll be exploring the power and limitations of using the TanH activation function in neural networks. We'll look at its unique properties, advantages, and disadvantages, and discuss some use cases where the TanH function is particularly effective. One of the main advantages of using the TanH function is that it's a zero-centered function.
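The zero-centered property is easy to verify numerically; here's a quick comparison with the Sigmoid function on a symmetric toy grid:

```python
import numpy as np

x = np.linspace(-5, 5, 11)          # symmetric grid around 0
tanh_out = np.tanh(x)               # range (-1, 1)
sigmoid_out = 1 / (1 + np.exp(-x))  # range (0, 1)

# tanh is an odd function, so outputs on a symmetric grid average to zero;
# sigmoid outputs are always positive, so here they average to 0.5
print(tanh_out.mean(), sigmoid_out.mean())
```

Zero-centered activations keep the signals passed to the next layer from all sharing one sign, which generally makes gradient-based optimization better behaved than with the always-positive Sigmoid.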